The Entropy of Words - Learnability and Expressivity across More than 1000 Languages

Authors

  • Christian Bentz
  • Dimitrios Alikaniotis
  • Michael Cysouw
  • Ramon Ferrer-i-Cancho
Abstract

The choice associated with words is a fundamental property of natural languages. It lies at the heart of quantitative linguistics, computational linguistics, and the language sciences more generally. Information theory gives us tools to precisely measure the average amount of choice associated with words – the word entropy. Here we use three parallel corpora – encompassing ca. 450 million words in 1916 texts and 1259 languages – to tackle some of the major conceptual and practical problems of word entropy estimation: dependence on text size, register, style, and estimation method, as well as the non-independence of words in co-text. We present three main results: 1) a text size of 50K tokens is sufficient for word entropies to stabilize throughout the text; 2) across languages of the world, word entropies display a unimodal distribution that is skewed to the right, suggesting a trade-off between the learnability and expressivity of words; 3) there is a strong linear relationship between unigram entropies and entropy rates, suggesting that they are inherently linked. We discuss the implications of these results for studying the diversity and evolution of languages from an information-theoretic point of view.
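To make the quantity under discussion concrete, the unigram word entropy can be illustrated with a plug-in (maximum-likelihood) estimate computed directly from token frequencies. This is only a minimal sketch of the definition H = -Σ p(w) log₂ p(w), not the bias-corrected estimators the paper actually evaluates; the toy sentence is invented for illustration:

```python
from collections import Counter
import math

def unigram_entropy(tokens):
    """Plug-in estimate of the unigram word entropy in bits per word:
    H = -sum over word types w of p(w) * log2 p(w),
    where p(w) is the relative frequency of w in the token list."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Toy example: a 13-token text with 8 word types.
tokens = "the cat sat on the mat and the dog sat on the log".split()
print(round(unigram_entropy(tokens), 3))  # ≈ 2.777 bits per word
```

The plug-in estimator systematically underestimates entropy for small samples, which is why stabilization with text size (the paper's first result) matters in practice.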


Similar resources

The word entropy of natural languages

The average uncertainty associated with words is an information-theoretic concept at the heart of quantitative and computational linguistics. The entropy has been established as a measure of this average uncertainty, also called average information content. We here use parallel texts of 21 languages to establish the number of tokens at which word entropies converge to stable values. These converg...


Model-Theoretic Expressivity Analysis

In the preceding chapter the problem of comparing languages was considered from a behavioral perspective. In this chapter we develop an alternative, model-theoretic approach. In this approach we compare the expressiveness of probabilistic-logic (pl-) languages by considering the models that can be characterized in a language. Roughly speaking, one language L is at least as expressive as another ...


Exploring the Relationship Between Learnability and Linguistic Universals

Greater learnability has been offered as an explanation as to why certain properties appear in human languages more frequently than others. Languages with greater learnability are more likely to be accurately transmitted from one generation of learners to the next. We explore whether such a learnability bias is sufficient to result in a property becoming prevalent across languages by formalizin...


The Influence of Sociological Factors on Usage of Mazandarani Language among the Youth

In this research, it has been attempted to determine the social role of two languages, Persian and Mazandarani, in Qaemshahr and their influence on young people's use of these linguistic varieties. In societies with more than one language, we see the collision of languages in various forms. In other words, some consequences of this collision of languages cause the loss of the imp...


A Sound Symbolic Study of Translation of Onomatopoeia in Children's Literature: The Case of "Tintin"

As onomatopoeic words or expressions are attractive, users of languages in the fields of religion, literature, music, education, linguistics, trade, and so forth wish to utilize them in their utterances. They are more effective and imaginative than simple words. Onomatopoeic words or expressions connect us to the real nature of things and to our inner senses. This study aims at familiarity with on...



Journal:
  • Entropy

Volume 19, Issue –

Pages –

Publication year: 2017